Bayesian Large-scale Multiple Regression with Summary Statistics from Genome-wide Association Studies.
نویسندگان
چکیده
Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a "Regression with Summary Statistics" (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise, than previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.
منابع مشابه
Supplement to “bayesian Large-scale Multiple Regression with Summary Statistics from Genome-wide Association Studies”
A Theoretical derivation of RSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 A.1 Proofs of propositions in main text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 A.1.1 Proof of Proposition 2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 A.1.2 Proof of Proposition 2.2 . . . . . . . . . . . . . . ....
متن کاملAnalysis of genome-wide association data by large-scale Bayesian logistic regression
Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple comparisons adjustment. Multivariate logistic regression is proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs) or when correlation among SNPs is high, tra...
متن کاملJAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects
Recently, large scale genome-wide association study (GWAS) meta-analyses have boosted the number of known signals for some traits into the tens and hundreds. Typically, however, variants are only analysed one-at-a-time. This complicates the ability of fine-mapping to identify a small set of SNPs for further functional follow-up. We describe a new and scalable algorithm, joint analysis of margin...
متن کاملBayesian Variable Selection Regression for Genome - Wide Association Studies , and Other Large - Scale Problems
We consider applying Bayesian Variable Selection Regression, or BVSR, to genome-wide association studies, and similar large-scale regression problems. Currently, typical genome-wide association studies measure hundreds of thousands, or millions, of genetic variants (SNPs), in thousands or tens of thousands of individuals, and attempt to identify regions harboring SNPs that affect some phenotype...
متن کاملA pathway analysis method for genome-wide association studies.
For genome-wide association studies, we propose a new method for identifying significant biological pathways. In this approach, we aggregate data across single-nucleotide polymorphisms to obtain summary measures at the gene level. We then use a hierarchical Bayesian model, which takes the gene-level summary measures as data, in order to evaluate the relevance of each pathway to an outcome of in...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- The annals of applied statistics
دوره 11 3 شماره
صفحات -
تاریخ انتشار 2017